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Abstract 

Multiple change-point detection models assume that the observed data is a 
realization of an independent random process affected by K — 1 abrupt changes, 
called change-points, at some unknown positions. For off-line detection a dynamic 
programming (DP) algorithm retrieves the K — 1 change-points minimizing the 
quadratic loss and reduces the complexity from Q{n K ) to @(Kn 2 ) where n is 
the number of observations. The quadratic complexity in n still restricts the 
use of such an algorithm to small or intermediate values of n. We propose a 
pruned DP algorithm that recovers the optimal solution. We demonstrate that 
at worst the complexity is in 0(Kn 2 ) time and 0{Kn) space and is therefore at 
worst equivalent to the classical DP. We show empirically that the run-time of our 
proposed algorithm is drastically reduced compared to the classical DP algorithm. 
More precisely, our algorithm is able to process a million points in a matter of 
minutes compared to several days with the classical DP algorithm. Moreover, the 
principle of the proposed algorithm can be extended to other convex losses (for 
example the Poisson loss) and as the algorithm process one observation after the 
other it could be adapted for on-line problems. 

1 Introduction 

Segmentation or change-point detection problems arise in many fields, from economet- 
rics [2] to molecular biology [12] . The goal is to partition the signal in K homogeneous 
and contiguous segments of variable length or time. These segments are delimited by 
the K — 1 change-points. These change-points can be identified on-line (sequentially) or 
off-line (retrospectively) [3] . For the off-line detection problem, a dynamic programming 
(DP) algorithm recovers the optimal solution with respect to the quadratic loss in 
Q(Kn 2 ) time and Q(Kn) space ([8], [TT], PQ, [9]). The quadratic complexity in n 
restricts the use of such an algorithm to small or intermediate values of n. 

To get around this problem several approaches have been proposed. These methods 
rely either on efficient heuristics (see [7]) or on small modification of the optimisation 
problem (see [13], [in])- Overall these methods will not find the optimal solution with 
respect to the quadratic loss and in this sense there is a trade-off between run-time and 
introducing some errors. Many other methods can be used for fast segmentation, for 
example wavelets denoising (see for example 0]), or particle filter (see [5], [6]) but they 
do not try to optimize the same criteria. 
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Here we propose a pruned DP algorithm which retrieves the optimal solution for 
large values of n. More precisely, we have a sequence of n observations {^i}ieii,n] 
in a set A (N, K, ■••). We call M.R,t the set of all possible segmentations in K 
segments up to point t (card(M.K,t) = (fell))- We denote r k {m) = [Tfc,7£. +1 [ the 
fc-th segment of segmentation m delimited by change-points r k and r k +x- With the 
convention that T\ — 1 and Trt + i = t + 1, any segmentation m of .M^t is defined as 
{[n, t 2 [, • • • , [tjc, r^ + i[}. Our algorithm recovers: 

r 6 m 

where ^ : A x M — > IR is a convex function of \x. Throughout the paper we will take 
the quadratic loss, j(Yi,fj,) = (Yj — /i) 2 , as a leading example. 

In Section |5J we give a brief overview of the classical DP ( [8] , [H] , [1] ) and give 
the main idea of our algorithm. In Section [3j we describe our proposed pruned DP 
algorithm and in Section [4] we describe a simple execution of the algorithm. In Section 
[5] we demonstrate its worst case complexity. In Section [6j we compare the run-time of 
our pruned DP algorithm and the classical DP on both simulated and real data. 



2 From the classical to the pruned DP algorithms 

For any segment r, we define its cost as g r (fj) = Yli e r^O^'^) anc ^ ^ s optimal cost 
as c r = min^{g r (fi)}. We define C K>t = min {m e M K , t }{ Y^ r em c r }■ Because C K>t is 
segment additive, we retrieve the update equation: 

V t>K C K ,t = min { C K -ij + cu +1 t+ i\\ (1) 

K-l<j<t 

Using update equation Q, the classical DP performs t comparisons at each step and 
therefore retrieves Cx, n +i in Q(Kn 2 ). 

Assume we know the optimal mean value /x* of the last segment, then the problem 
is to minimize H k ,t{^*)'- 

r fc _i(m) 

Hk,t{ff) = min {m e Mkt} { ^ c r + g r . k{m) (fi*)} , 

r=ri (rra) 

Hkj(/i*), as Cx,t, is segment additive and ^mCa**) i s point additive, thus we retrieve 
the update equation: 

Vt>k H k>t+1 (n*)=min(H k!t (fi*), C fc _ 1>t )+ 7 (Y t+1 , fi*) (2) 

Using update equation ([2]), we perform one comparaison at each step and we retrieve 
in 0(n). Most of the time we do not know fi*. A simple heuristic is to test 
a large but finite set of possible fi* denoted here {^p}pe[p,,P]- Using update equation ^ 
on all Hk,t{^V)-, we perform P comparaisons at each step and we retrieve all H^n+iif-C,) 
in 0{Pnj. 

To recover a solution as close as possible to the optimal one, we can increase the 
number of tested values and for each of these /j* store an optimal last change-point. 
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Intuitively, two close values of //*, n* pi and /x* 2 , probably share the same optimal last 
change-points. In other words the heuristic is storing too much information and it seems 
that the heuristic should stores only critical values of a* corresponding to a change in 
the last optimal change-point. Our pruned DP algorithm is build on this idea. 

3 Algorithm 

We define H kjt as: 

r k -i(m) 

H k M = min {m e Mk t} { c r + l(Yi, a) } 

r=ri(m) «Grj,(m) 

and h ku i(ij) as: 

t 

V t' < t h k ,t,v{p) = Cfc-i,* + 7(*i>A0» 

i=t'+l 

where /i^^f represents the cost of the best candidate segmentation in segments having 
its last change-point at position t' and a last mean value of /i. We have: H kt (ii) = 
min{ t > e [it,t-i]}{ h ktt ,t'(fJ>) }• We define Sfc^e as the set of u such that a last change-point 
at t' is optimal: 

and the set 7 fcjtjt / of a such that a last change at t' is better than a change at t: 

h,t,t r = { | hk t t,t'(n) < Cfc-i,t } 

Here we review the key properties of h kjt ,t' and Sfc^f that allow their simple con- 
struction in the course of the proposed pruned DP algorithm. 

Proposition 3.1 

Additivity V t' < t h Kt+ i, t '{lA = h k ,t,v(p) + l( Y t+i, A*) 

Stability h kUo (ij) > h kUl {n) V t* > t h k , t * ito (ii) > h krM (n) 
Pruning 

V t>t' >k, S k ,t+i,t' = S k ,t,t' H I k ,t,t' (3) 

S k ,t,t> = ® Vt*>i S w = (4) 

V t' > k, Sk,f,t' = Cffi(U t6 [ fe)t '_ij4 At ') (5) 

Storage lk,t,t' i & an interval and S k ,t,t' i s a finite union of intervals. 

In other words, if Sfc^f is empty we are certain that, whatever the observations after t, 
the best segmentation does not have a change at t'. The proofs of these properties are 
simple and left to the reader. 

Using these propositions our pruned DP algorithm stores a list, UstOfCandidates k , 
of candidate last change-points t', update for each t' a cost function Cost k;t ' and a set 
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of winning intervals Set k y (see proposition 3.1 -Storage). Cost kjt > and Setk,t> store the 
successive hk,t,v an d Sk,t,#- 

In the case of the quadratic loss, 7(F i ,/x) = (Y^ — /i) 2 , Cost k>t > is stored as a second 
degree polynomial function of //. It is initialized when arriving at observation t' as 
Costkj' '■= Cfc-i.t'- Costal is updated with proposition 3.1 -Additivity. 

Setk t t> is stored as a list of intervals (see proposition 3.1 -Storage). It is initialized 
when arriving at observation t' as Set kyt i '■= D. In the case of the quadratic loss, D can 
be chosen as [mini(Yi), max^Yj)]. Set kyt i is updated using proposition 3.1 -Pruning ^ 
and (|5j and a candidate change-point is pruned as soon as Set kyt i = using proposition 
3. If Pruning 
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The algorithm is described in more details in Algorithm [T] for a given k > 2. 
Algorithm [T] should be iterated for all k up to K. For = 1, all Ci tt are computed 
in 0(n) as in the classical DP algorithm. Importantly, enough the algorithm could be 
modified for on-line problems as the observations are used one after the other. 



Algorithm 1 Pruned DP algorithm for segmentation 
UstOfCandidates k := {k — 1} 
Costk^k-i '■= Cfc-i,fc-i ; Setk y t> '■= D 
for t e [fc..n] do 

Cost ktt := C k -i tt ; Set k)t := £) 
for / G UstOf Candidates k do 
Cost fc ,i := CW M + 7(F t+ i, .) 
J = {// | Cost kt i(/i) < C fc -i,t} 
S'etfc^ := 5et fe) / fl J 
if Setfc^ = then 

HstOfCandidates k := UstOfCandidates k \ {/} 
end if 

S'etfc^ := Set k , t \ I 
end for 

if Set k>t ^ then 

HstOfCandidates k := UstOfCandidates k U {t} 
end if 

= miniiciistofCandidates^imin^ (Cost ki i)} 
end for 



4 An example 

Here we illustrate the different steps of the proposed DP algorithm with a four point 
signal and for k = 2 segments. These four points are respectively 0, 0.5, 0.4, —0.5 (see 
Figure [l] top left). 

Step 1 The only two segments segmentation up to point t = 2 has a break after point 
1. Its cost is h 2 .2,i{^*) = Ci ; i + (0.5-/i*) 2 = + (0.5-/i*) 2 and is stored in CW 2 ,i- The 
cost depends on jj* and is shown in Figure [l] (top middle). S , et 2 ,i is set to [—0.5,0.5]. 
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-1.0 -0.5 0.0 0.5 1.0 -1.0 -0.5 0.0 0.5 1.0 -1.0 -0.5 0.0 0.5 1.0 

a 

Figure 1: Four points signal (top left); Successive stored candidate functions h 2i . t . as 

a function of /x*. h 2 ^,i (top middle), h 2 ^ 2t \ (— ) and h 2 ,2,2 ( ) (top right), h.2,3,1 (— ) 

and ( ) (bottom left), h 2 ,3,i (—), ^2,3,2 ( ) and h 2 ,3,3 (• * • ) (bottom middle), 

^2,4,1 ( — ) and ^2,4,3 (" " " ) (bottom right). 

Step 2 If we go up to point t = 3 an other solution is a break just after 2. Its cost 
is h 222 (n*) = Ci,2 h is stored in object Cost 2 ,i and is compared to h 2j2 ,i (see Figure [I] 
top right) too retrieve Set 2) \ = [0.5 — ^7^, 0.5] and Set 2j2 = [—0.5,0.5 — Then we 

use (0.4 — /i*) 3 to update Cosi 2 ,i and Cost 2i2 to retrieve ^2,3,1 and ^2,3,2 (see Figure [l] 
bottom left). 

Step 3 If we go up to point t = 4 an other solution is a break just after 3. Its cost 
is ^2,3,3(a**) = Ci,3, it is stored in object Cost 2j \ and then compared to ^2,3,1 and h 2 ,z,2 
to retrieve Set 2 ,i — [Jjj — J^, 0.5], Set 22 = and Set 2>3 = [—0.5, — ^] . It can be 
seen that ft.2,3,2 (see Figure]!] bottom middle) is beaten for any // thus the pruned DP 
algorithm discard ^2,3,2 • Thus we already know that the best segmentation will not have 
a break between 2 and 3. Then we use (—0.5 — ji*) 2 to update Cost 2 $ and Cost 2 ^ to 
retrieve ^2,4,1 and /i2,4,3 (see Figure [I] bottom right). We see that the best segmentation 
is obtain for /j* = —0.5 and a break after point 3. 

5 Worst case complexity 

Here, we demonstrate that, if the 7 function is convex the complexity of our proposed 
algorithm is bounded in 0(Kn 2 ) time and if the cost function can be written using a 
base the complexity is bounded in O(Kn) space. 

For a given number of segments k, at step i of the algorithm, at most % change-point 
candidates need to be updated. For each candidate change-point t the algorithm will: 

• update the cost function Cost^t', 
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compute its minimum value ; 

compute the roots of Costk,t(fi) = Ck-i,i to get / = {fi\Costk t t(n) < Cfc-i,*} ; 
update the winning intervals Setkt- 



All these steps are in 0(1) except the update of intervals. Theorem A. 3 demonstrated 
in the appendix, show that if there are % candidates then there are at most 2% — 1 
intervals. Thus, the time complexity is bounded by ^Li(0(3z) + 0(2i — 1)) ~ 0{n 2 ). 
As this should be done for all k < K we retrieve a worst case complexity of 0(Kn 2 ). 

Assuming that the cost function can be stored efficiently using a base of I functions, 
the algorithm will store the optimal change-point for each t in O(Kn), the candidate 
cost function for each t in 0(£n) and the set of intervals in 0(2n — 1). Thus if I is small 
compared to K, we retrieve an O(Kri) space complexity. 

For j(Y t , n) = (Y t — jj) 2 the worst case complexity, in 0(Kn 2 ), is achieved for the 
sequence Y^ — i. If the signal is constant, the complexity is in O(Kn) as the algorithm 
will store only one candidate change-point at each step. Thus it can be hoped that for 
more or less piecewise constant signal the complexity will be closer to O(Kn) than to 
0(Kn 2 ). 



6 Empirical complexity 

In this section, we assessed the efficiency of the pruned DP algorithm on simulated data 
set and real data set for 7(1*, fi) = (Y t — fi) 2 . The algorithm was implemented in C++ 
and was run on a 3.16GHz Intel(R) Xeon X5460. 



6.1 Simulated data 

We simulated sequences using a constant, sinusoid or rectangular signal. For the 
sinusoid and rectangular waves we consider various amplitudes and frequencies. We 
considered a Gaussian noise of variance 1, a uniform noise of variance 1, a chi-square 
noise of variance 1 and a Cauchy noise. 

We first compare the pruned DP to the classical DP algorithm for relatively small 
sample sizes, n < 16. 10 3 and checked that the two algorithms retrieve the same result. 
Then we evaluate the run-time of the pruned DP algorithm to process large signals, 
n = 10 6 . Finally, we compute at each step of the algorithm the maximum number of 
intervals stored. 

The comparaison with the classical DP is summarized in Figure [2| The mean 
number of intervals stored by the pruned DP algorithm is summarized in Figure [3] left 
and middle. The empirical run-time for large signals of 10 6 points in 50 segments was 
around 80-250 seconds. The shorter run-times were achieved for a cauchy noise with 
a run-time close to 80 seconds. The longer run-time were achieved for sine wave with 
a run-times around 240 — 250 seconds. Non sine wave signal had a run-time around 
100-120 seconds. Mean run-times for the different tested signals will be summarized in 
table 1 (supplementary materials). Moreover the codes used for these simulations will 
be provided in supplementary materials. 
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Figure 2: Mean run-time of the classical DP (♦) and pruned DP algorithm (•) for 
simulated constant signal with a normal noise. For sequences up to 16. 10 3 points (left) 
and for sequences up to 10 6 points (right) 





Figure 3: Maximum number of intervals stored by the pruned DP algorithm at each 
step of the algorithm for k = 2. For a 100 simulated constant signal of 10 6 points with 
a normal noise of variance l(left), for a 100 simulated sine wave signal of 10 6 points 
with a normal noise of variance 1 (middle), for 19 CNV profiles of 10 6 points (right). 



6.2 Real data 

We download the publicly available GSE17359 project from the GEO database. This 
data set is composed of 18 SNP (single Nucleotide Polymorphism) array experiments. 
SNP arrays enable the study of DNA copy number gains and losses along the genome. 
For this kind of data a multiple change-point detection procedure based on the quadratic 
loss is often used ([T2]). For each SNP array experiment there are two signals of a 
million points (CNV and SNP) that were analyzed separately. The mean run-time 
were respectively of 101 (CNV) and 103 (SNP) seconds. The maximum number of 
intervals stored by the pruned DP algorithm is summarized in Figure [3] right. 



Overall it can be seen that the pruned DP algorithm recovers the optimal solution 
in a drastically reduced amount of time compared to the classical DP algorithm. This 
can be explain by the very small number of candidates stored by the algorithm (see 
Figure [3]). More importantly it is able to retrieve this solution in a reasonable amount 
of time even for large signals and can be used in practise for example to analyse large 
SNP profiles. 
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A Lemma and Proofs 

In this section we demonstrate that if there are t candidate change-points, then there 
are at most 2t — 1 intervals. This can be demonstrated for a relatively large class of 
convex functions. 

A.l B functions 

Definition A. 1.1 Let B n denote the set of all functions B : R — >■ R such that 

n+1 

V [i e R, B(fi) = min {t e [ljn]} {u B ,t + 

j=t+i 

where all u B ,t are rea l numbers and all f B j are convex functions of /i. Note that B n C 
B n+1 . LetB = {JB n . 
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Definition A. 1.2 For any B in B n and A a subset of [1, nj we define the function B A 
as 

n+l 

B A (n) = min {t e A} {u B ,t + /-Bj(aO} 

j=t+i 

Proposition A.l B A G B card ( A ) 

It is easily shown for A = \ {i} with i in [l,n] and thus by induction it is true 

for any A. 

Definition A. 1.3 The rank of a function B G B is 11(B) = min{n G N* \B G B n } 

A. 2 Decomposition in intervals and order of B 

Definition A. 2.1 LetX be a partition ofR in a finite set of intervals X = {Ij}{je[i,k}}- 
X is a k- decomposition of a function B G B if 

V I j} 3 i, \/x G Ij Bi(x) = B(x) 

The set of all B functions with a k- decomposition is denoted B h . Similarly the set of 
all B n functions with a k- decomposition is denoted B\. 

Proposition A. 2 If B is B then there exists a k- decomposition of B. 

Definition A. 2. 2 The order 0(B) of a B function is min{k eW\B G B k } 

Theorem A.3 For all B e B, we have 0(B) < 2 x TZ(B) - 1. 

Proof We demonstrate this theorem by induction. It is true if 0(B) = 1. Assume it 
is true for any B with 0(B) = n. Let B be B with 0(B) — n + l. We have: 

V/*el, B(fi) =mm{5 [1) „](/i),B{„ +1 j(/i)} 

B(n) =min{C(fi), u B , n +i} + fB,n+2(v), 

where C G B n : 

n+l 

c(fi) = {u Byt + IbM) 
j=t+i 

Let X = Uje[i fe] I j be the smallest set of intervals such that: 

V Ij G X, V \i G Ij u B)n+ i > C(fl) 

Let A k be the subset of [l,n] defined as {i\ 3 x G Ik C^(x) < u Bj7l+1 }. As 
IZ(B) — n + l and as for all % in [l,n] is convex, there exists a unique j such that 
i G Aj and therefore Y^j=i card(Aj) = n. 

In each interval Jfc, we have C(fi) = C Ak (n). By induction, 0(C Ak ) < 2xcard(A k ) — 

1. 

Overall, for any B, we have: 

k 

0(B) < J2 k j= i 0(C Ak ) + (k + 1) < 2 ^ card(A k ) - k + (k + 1) <2n + l. 

3=1 
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